Information-theoretic bound on the energy cost of stochastic simulation 
Physical systems are often simulated using a stochastic computation where different final states
result from identical initial states. Here, we derive the minimum energy cost of simulating a complex 
data set of a general physical system with a stochastic computation. We show that the cost is 
proportional to the difference between two information-theoretic measures of complexity of the data 
- the statistical complexity and the predictive information. We derive the difference as the amount 
of information erased during the computation. Finally, we illustrate the physics of information by 
implementing the stochastic computation as a Gedankenexperiment of a Szilard-type engine. The 
results create a new link between thermodynamics, information theory, and complexity. 
PACS numbers: 02.50.Ey Stochastic processes, 05.70.-a Entropy thermodynamics, 89.70.Cf Entropy infor-
The idea of physics as information has a long history.
The concept of entropy, at the heart of information theory,
originated in the theory of thermodynamics. It was Maxwell and
Boltzmann who, in the 19th century, recognized the intricate
link between probability distributions over configurations and
thermodynamics. This laid the foundation of the field of
statistical mechanics. The similarity between the thermodynamic
entropy and the information entropy, introduced in 1948 by
Shannon, led to a whole new perspective on physical processes
as storing and processing information. It also led to paradoxes
such as Maxwell's demon, which seemed to suggest that work could
be generated from heat using information alone, which would
violate the second law of thermodynamics (for a review, see
Refs. [mi]). The paradox was resolved independently by Penrose
and Bennett by considering the entropy created by erasing
information [TJ [3].

With the insight that erasing information generates entropy,
Zurek found limits on the thermodynamic cost of deterministic
computation using algorithmic complexity [4]. A deterministic
computation generates a unique output given a particular input.
Repeated computations yield identical results. A stochastic
computation, on the other hand, yields different outputs for
identical inputs. It is a useful descriptor of natural processes,
which are often stochastic and have different final states given
'identical' initial states (within a given finite resolution or,
in quantum mechanics, even infinite resolution). Here, we derive
the minimum energy cost of simulating a complex data set with a
stochastic computation. We show that it is proportional to the
difference between the statistical complexity [5] and the
predictive information of the data [6]. We derive this difference
as the amount of information erased during the computation.
Finally, we illustrate the physics of information by
"implementing" the stochastic computation with a Szilard-type
engine.

It has been shown that the difference between these 
two measures arises from an asymmetry in the transport 
of information forward and backward in time [7]. In this
paper we give a physical explanation of this asymmetry 
together with new mathematical proofs of the relevant 
information theory. 

Consider the following model of stochastic computation: A given
computational device is in some initial state and outputs
length-$N$ strings of symbols $x^N$ according to a probability
distribution $\Pr(X^N)$. If the distribution $\Pr(X^N)$ is
statistically indistinguishable from that of an observed data set
of a physical system, the symbol sequence $x^N$ is a simulation
of that system. Whereas the probability distribution $\Pr(X^N)$
is a very uncompressed description, one step away from the raw
data of an experiment, the computational device simulating it is
a very compact description, a summary of the regularities, a
first step toward a 'theory' explaining an experiment. We call
the joint probability distribution of past and future
observations $\Pr(X_{-\infty}^{\infty})$ a stochastic process.
The provably unique minimal (in terms of information stored) and
optimal (in terms of prediction) computation-theoretic
representation summarising the regularities of a stochastic
process is the so-called ε-machine [5, 8]. It consists of an
output alphabet $\mathcal{A}$, a set of causal states
$\mathcal{S}$, and stochastic transitions between them. For every
pair of states $s, s' \in \mathcal{S}$, probabilities
$\Pr(S_i = s' | S_{i-1} = s, X_i = x)$ are defined for going from
state $s$ to state $s'$ while outputting symbol
$x \in \mathcal{A}$. The statistical complexity of a process is
defined as the Shannon entropy over the stationary distribution
of its ε-machine's causal states [22]:

$$C_\mu := H(S). \tag{1}$$

$C_\mu$ is the number of bits required to specify a particular
causal state in the ε-machine. It is the number of bits
that need to be stored to optimally predict future data
points.
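As a small illustration of Eq. 1, the following sketch computes the statistical complexity of a toy two-state machine. The "golden mean" machine, the dictionary layout, and the function names are assumptions made for this example and do not appear in the original text; the stationary distribution is approximated by simple power iteration, which assumes the state chain is aperiodic.

```python
from math import log2

# Hypothetical example, not taken from the paper: the two-state
# "golden mean" machine. Keys are (state, symbol); values are
# (next_state, probability). Each key has exactly one next state.
GOLDEN_MEAN = {
    ("A", "1"): ("A", 0.5),
    ("A", "0"): ("B", 0.5),
    ("B", "1"): ("A", 1.0),
}


def stationary_distribution(machine, iterations=200):
    """Approximate the stationary distribution over states by power iteration."""
    states = sorted({state for state, _symbol in machine})
    pi = {state: 1.0 / len(states) for state in states}
    for _ in range(iterations):
        updated = {state: 0.0 for state in states}
        for (state, _symbol), (next_state, prob) in machine.items():
            updated[next_state] += pi[state] * prob
        pi = updated
    return pi


def statistical_complexity(machine):
    """C_mu = H(S): Shannon entropy (bits) of the stationary state distribution."""
    pi = stationary_distribution(machine)
    return -sum(p * log2(p) for p in pi.values() if p > 0)


pi = stationary_distribution(GOLDEN_MEAN)
c_mu = statistical_complexity(GOLDEN_MEAN)
print(pi, c_mu)
```

For this toy machine the stationary distribution is (2/3, 1/3), so $C_\mu = \log_2 3 - 2/3 \approx 0.918$ bits.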

The predictive information of a data set is given by the mutual
information between its two halves, e.g. the past data and the
future data [HI IS]:

$$E = \lim_{N \to \infty} I[X_{-N}^{-1}; X_0^{N-1}], \tag{2}$$

where $X_{-N}^{-1}$ and $X_0^{N-1}$ are strings of random
variables representing observations of a stochastic process.
Predictive information is also known as excess entropy, effective
measure complexity, and stored information (see [8] and refs.
therein). For the following thermodynamic development of
stochastic computation we find the name predictive information
most suitable. The predictive information, measured in bits, can
be interpreted as the average number of bits a process stores at
a given point in time and "transmits" to the future. It is known
that

$$C_\mu = E + H(S_{-1} | X_0^{\infty}) \tag{3}$$

and hence that $E \le C_\mu$ [8, Theorem 5]. Two important
properties of ε-machines of relevance here are that the next
state given the last state and the current symbol is uniquely
determined (Eq. 4) and that the state after the observation of a
long enough sequence of symbols is uniquely determined (Eq. 5):

$$H(S_n | S_{n-1} X_n) = 0 \quad \text{(deterministic-stochastic)} \tag{4}$$
$$\lim_{N \to \infty} H(S_N | X_0^N) = 0 \quad \text{(synchronising)} \tag{5}$$
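The mutual information in Eq. 2 can be evaluated for finite $N$ from block entropies, since for a stationary process $I[\text{past}; \text{future}] = 2H(N) - H(2N)$ when both halves have length $N$. The sketch below does this for a hypothetical golden mean machine; the machine, its hard-coded stationary distribution, and all function names are assumptions for illustration only.

```python
from math import log2

# Hypothetical golden mean machine: (state, symbol) -> (next_state, probability).
MACHINE = {
    ("A", "1"): ("A", 0.5),
    ("A", "0"): ("B", 0.5),
    ("B", "1"): ("A", 1.0),
}
# Stationary state distribution of this particular machine (assumed known).
PI = {"A": 2 / 3, "B": 1 / 3}


def word_probabilities(machine, pi, length):
    """Probability of every emitted word of the given length, started from pi."""
    frontier = {("", state): p for state, p in pi.items()}
    for _ in range(length):
        extended = {}
        for (word, state), p in frontier.items():
            for (src, symbol), (dst, q) in machine.items():
                if src == state:
                    key = (word + symbol, dst)
                    extended[key] = extended.get(key, 0.0) + p * q
        frontier = extended
    words = {}
    for (word, _state), p in frontier.items():
        words[word] = words.get(word, 0.0) + p
    return words


def block_entropy(machine, pi, length):
    """Shannon entropy (bits) of length-`length` words."""
    probs = word_probabilities(machine, pi, length).values()
    return -sum(p * log2(p) for p in probs if p > 0)


def predictive_information(machine, pi, n):
    """Finite-n estimate I[past_n; future_n] = 2 H(n) - H(2n)."""
    return 2 * block_entropy(machine, pi, n) - block_entropy(machine, pi, 2 * n)


e_estimate = predictive_information(MACHINE, PI, 4)
print(e_estimate)
```

Because this toy process is a first-order Markov chain, the estimate already equals its limit, $\log_2 3 - 4/3 \approx 0.252$ bits, at small $N$; for general processes the estimate only approaches $E$ from below as $N$ grows.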

Successful inference of ε-machines ranges from dynamical
systems [5], spin systems [10], and crystallographic data [11]
to molecular dynamics [12], atmospheric turbulence [13], and
self-organisation [14].

Landauer defines an operation to be logically irreversible if
the output of the operation does not uniquely define the inputs
[15]. In other words, logically irreversible operations erase
information about the computational device's preceding logical
state. Landauer's insight was that logical information erasure
costs energy [15]. In the following we discuss how Landauer's
principle and logical irreversibility apply to stochastic
computation and, in particular, to the computation of a
stochastic process. For a given ε-machine the current state and
the next symbol determine the next state uniquely (Eq. 4). The
reverse, however, is not necessarily true. Given the current
state and last output symbol, the previous state is not always
uniquely determined. In this case the ε-machine is logically
irreversible. Following Landauer's definition of irreversibility,
we define the information erasure per computational step of a
given ε-machine as the entropy of the previous state ($S_{i-1}$)
given the current state ($S_i$) and last output symbol ($X_i$):

$$h_{\mathrm{erase}} := H(S_{i-1} | X_i S_i). \tag{6}$$

For later use, we also define
$h_{\mathrm{erase}}^{N} := H(S_{i-N} | X_{i-N+1}^{i} S_i)$ for
$i > N$ and
$H_{\mathrm{erase}} := \lim_{N \to \infty} h_{\mathrm{erase}}^{N}$.
$h_{\mathrm{erase}}$ can be calculated from the ε-machine
directly. It quantifies the average irreversibility of a
computational step of the ε-machine. We now show that strict
equality between $E$ and $C_\mu$ holds if and only if the
ε-machine is fully logically reversible, i.e. iff
$h_{\mathrm{erase}} = 0$.
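The information erasure of Eq. 6 follows directly from the one-step joint distribution over (previous state, symbol, current state). The sketch below is a minimal illustration for two hypothetical toy machines, the "golden mean" and "even" processes, neither of which is discussed in the original text; the dictionary layout and the hard-coded stationary distribution are likewise assumptions.

```python
from math import log2

# Hypothetical machines: (state, symbol) -> (next_state, probability).
GOLDEN_MEAN = {
    ("A", "1"): ("A", 0.5),
    ("A", "0"): ("B", 0.5),
    ("B", "1"): ("A", 1.0),
}
EVEN = {
    ("A", "0"): ("A", 0.5),
    ("A", "1"): ("B", 0.5),
    ("B", "1"): ("A", 1.0),
}
# Both machines share the same state-to-state chain, hence the same
# stationary distribution (assumed known here).
PI = {"A": 2 / 3, "B": 1 / 3}


def information_erasure(machine, pi):
    """h_erase = H(S_{i-1} | X_i, S_i) in bits."""
    # Joint probability Pr(S_{i-1} = s, X_i = x, S_i = t).
    joint = {(s, x, t): pi[s] * p for (s, x), (t, p) in machine.items()}
    # Marginal Pr(X_i = x, S_i = t).
    marginal = {}
    for (_s, x, t), p in joint.items():
        marginal[(x, t)] = marginal.get((x, t), 0.0) + p
    return -sum(
        p * log2(p / marginal[(x, t)])
        for (_s, x, t), p in joint.items()
        if p > 0
    )


h_erase_golden = information_erasure(GOLDEN_MEAN, PI)
h_erase_even = information_erasure(EVEN, PI)
print(h_erase_golden, h_erase_even)
```

In the golden mean machine both states can emit a 1 and end in state A, so the previous state is ambiguous and $h_{\mathrm{erase}} = 2/3$ bit. In the even machine every (symbol, current state) pair has a unique predecessor, so $h_{\mathrm{erase}} = 0$ and the machine is logically reversible.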

Theorem 1. The predictive information $E$ of a stochastic
process is equal to the statistical complexity $C_\mu$ if and
only if the information erasure of the corresponding ε-machine
is zero:

$$C_\mu = E \iff h_{\mathrm{erase}} = 0. \tag{7}$$

Proof. From the Markov property of the states $S_i$ (i.e.
$H(X_1 | S_{-1} X_0 S_0) = H(X_1 | X_0 S_0)$) it follows that
$H(S_{-1} | X_0 X_1 S_0) = H(S_{-1} | X_0 S_0)$. Using this
recursively and the fact that further conditioning never
increases the entropy, we obtain the forward direction:

$$C_\mu = E \;\Rightarrow\; H(S_{-1} | X_0^{\infty}) = 0 \;\Rightarrow\; H(S_{-1} | X_0^{\infty} S_0) = 0 \;\Rightarrow\; H(S_{-1} | X_0 S_0) = 0. \tag{8}$$

Going in reverse, we have
$H(S_{N-1} | X_N S_N) = 0 \Rightarrow H(S_{N-1} | X_0^N S_N) = 0$
and in addition

$$H(S_{N-1} | X_0^N S_N) = H(S_{N-1} | X_0^N) - H(S_N | X_0^N). \tag{9}$$

The second term on the RHS goes to zero in the limit
$N \to \infty$ (Eq. 5). In the same way
$H(S_{N-k} | X_0^N S_{N-k+1}) \to H(S_{N-k} | X_0^N)$,
$k = 2, 3, \ldots, N+1$, as $N \to \infty$. Setting $k = N+1$,
the claim follows. □

Note that this result automatically implies that any ε-machine
with $h_{\mathrm{erase}} = 0$ can be inverted in time, turning it
into a retrodicter as introduced in [7]. This provides an
immediate construction of a retrodicter for this case, saving the
'modeler' a second computationally costly inference procedure
(for more on computational cost see e.g. [16]).

For future reference we call $h^{-} := H(X_i | S_i)$ the
uncertainty of the last symbol given the current state. This
reverse entropy rate is different from the reverse entropy rate
referred to in dynamical systems theory (see e.g. [12]). The
complementary quantity to $h^{-}$ is the well-known entropy rate
of a stochastic process, $h = H(X_{i+1} | S_i)$. We can see that
the amount of information which can be erased per computational
step is upper bounded by the amount of information which is
created per computational step,

$$h_{\mathrm{erase}} \le h, \tag{10}$$

by writing the joint entropy $H(S_{i-1} X_i S_i)$ as two
different sums:

$$\begin{aligned}
H(S_i | S_{i-1} X_i) + H(S_{i-1} X_i) &= H(S_{i-1} | X_i S_i) + H(X_i S_i) \\
\Rightarrow\; H(X_i | S_{i-1}) + H(S_{i-1}) &= H(S_{i-1} | X_i S_i) + H(X_i | S_i) + H(S_i) \\
\Rightarrow\; h &= h_{\mathrm{erase}} + h^{-},
\end{aligned} \tag{11}$$

where, in the second line, we have used Eq. 4.
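The balance in Eq. 11 can be checked numerically from the one-step joint distribution. The sketch below uses a generic conditional-entropy helper on two hypothetical toy machines; the machines, the hard-coded stationary distribution, and the helper names are assumptions introduced for illustration and are not part of the original derivation.

```python
from math import log2

# Hypothetical machines: (state, symbol) -> (next_state, probability).
MACHINES = {
    "golden mean": {
        ("A", "1"): ("A", 0.5),
        ("A", "0"): ("B", 0.5),
        ("B", "1"): ("A", 1.0),
    },
    "even": {
        ("A", "0"): ("A", 0.5),
        ("A", "1"): ("B", 0.5),
        ("B", "1"): ("A", 1.0),
    },
}
PI = {"A": 2 / 3, "B": 1 / 3}  # stationary distribution of both machines


def step_joint(machine, pi):
    """Joint distribution Pr(S_{i-1}, X_i, S_i) of one computational step."""
    return {(s, x, t): pi[s] * p for (s, x), (t, p) in machine.items()}


def conditional_entropy(joint, target, given):
    """H(target | given) in bits; both arguments are tuples of coordinate indices."""
    def marginal(indices):
        out = {}
        for outcome, p in joint.items():
            key = tuple(outcome[i] for i in indices)
            out[key] = out.get(key, 0.0) + p
        return out

    both = marginal(given + target)
    cond = marginal(given)
    return -sum(
        p * log2(p / cond[key[: len(given)]])
        for key, p in both.items()
        if p > 0
    )


balances = {}
for name, machine in MACHINES.items():
    joint = step_joint(machine, PI)
    h = conditional_entropy(joint, (1,), (0,))          # H(X_i | S_{i-1})
    h_erase = conditional_entropy(joint, (0,), (1, 2))  # H(S_{i-1} | X_i S_i)
    h_rev = conditional_entropy(joint, (1,), (2,))      # H(X_i | S_i)
    balances[name] = (h, h_erase, h_rev)
    print(name, h, h_erase, h_rev)
```

Both toy machines create $h = 2/3$ bit per step. The golden mean machine erases all of it ($h_{\mathrm{erase}} = 2/3$, $h^{-} = 0$), while the even machine erases none ($h_{\mathrm{erase}} = 0$, $h^{-} = 2/3$).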

Now consider an ε-machine $M$ with $h_{\mathrm{erase}} > 0$. To
derive the difference between $E$ and $C_\mu$ we construct an
ε-machine which outputs more than one symbol at a time as
follows. For every pair of states $s_i, s_j \in \mathcal{S}$ of
ε-machine $M$ we construct the $N$th concatenation
$M^{\otimes N}$ with state transition probabilities

$$\Pr(s_j | s_i x^N) = \sum_{s^{N-1} \in \mathcal{S}^{N-1}} \Pr(s_j | s_{N-1} x_N) \left[ \prod_{k=2}^{N-1} \Pr(s_k | s_{k-1} x_k) \right] \Pr(s_1 | s_i x_1)$$

upon outputting $x^N$, where the sum runs over all state
sequences $s^{N-1}$ of length $N - 1$. Note that $M$ and
$M^{\otimes N}$ have the same set of states and the same
probability distribution over output strings $\Pr(X^N)$, so they
have the same $E$ and $C_\mu$.
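The construction can be sketched in a few lines. The version below is an illustration under stated assumptions: the machine is a hypothetical golden mean example, and the bookkeeping stores the joint probability of emitting a word and reaching the end state given the start state, rather than the conditional written above. Because each (state, symbol) pair has a single successor, at most one intermediate state sequence contributes to each entry.

```python
# Hypothetical golden mean machine: (state, symbol) -> (next_state, probability).
GOLDEN_MEAN = {
    ("A", "1"): ("A", 0.5),
    ("A", "0"): ("B", 0.5),
    ("B", "1"): ("A", 1.0),
}


def concatenate(machine, n):
    """n-fold concatenation: (start_state, word) -> (end_state, probability).

    One step of the returned machine emits a whole length-n word.
    """
    states = sorted({state for state, _symbol in machine})
    current = {(state, ""): (state, 1.0) for state in states}
    for _ in range(n):
        extended = {}
        for (start, word), (middle, p) in current.items():
            for (src, symbol), (dst, q) in machine.items():
                if src == middle:
                    # Unique successor per (state, symbol): no key collisions.
                    extended[(start, word + symbol)] = (dst, p * q)
        current = extended
    return current


two_step = concatenate(GOLDEN_MEAN, 2)
for key in sorted(two_step):
    print(key, two_step[key])
```

The concatenated machine uses the same dictionary layout as the original, with words in place of single symbols, so any function written for the single-step machine can be applied to it unchanged.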

Theorem 2. The difference between the statistical complexity
$C_\mu$ and the predictive information $E$ of a stochastic
process approaches the information erased by the corresponding
concatenated ε-machine $M^{\otimes N}$ as $N \to \infty$:

$$C_\mu - E = \lim_{N \to \infty} h_{\mathrm{erase}}^{N} = H_{\mathrm{erase}}.$$

Proof. By rewriting $H(S_{-1} S_N | X_0^N)$ in two different
ways we obtain the information-theoretic equality:

$$H(S_{-1} | X_0^N) = H(S_{-1} | X_0^N S_N) + H(S_N | X_0^N) - H(S_N | S_{-1} X_0^N). \tag{12}$$

The last term is zero due to determinism (Eq. 4); the second
term goes to zero for $N \to \infty$ (Eq. 5). Hence, taking the
limit we obtain:

$$H(S_{-1} | X_0^N) \to H(S_{-1} | X_0^N S_N) \quad \text{as } N \to \infty. \tag{13}$$

By Eq. 3 the left-hand side tends to $C_\mu - E$. The term on
the right-hand side is exactly $h_{\mathrm{erase}}$ of
$M^{\otimes N}$. Letting $N \to \infty$ the claim follows. □

Using Theorem 2 we can derive the minimum energy cost of
simulating a stochastic process. Fig. 1 schematically
illustrates an ε-machine contained in a box which on consecutive
time steps outputs symbols visible to an outside observer. The
computational steps are as follows. In (a) the ε-machine in the
box is in state $S_{i-1}$. Going to (b) it generates symbol
$X_i$ according to the probability distribution
$\Pr(X_i | S_{i-1})$, leading to an increase in entropy inside
the box by $h = H(X_i | S_{i-1})$. Going to (c) the ε-machine
moves from state $S_{i-1}$ to state $S_i$. Erasure of the
previous-state information causes a decrease in entropy inside
the box by $H(S_{i-1} | S_i X_i) = h_{\mathrm{erase}}$. Finally,
in (c) the symbol is ejected into the environment, which
decreases the entropy inside the box again, this time by
$H(X_i | S_i) = h^{-}$. This closes one cycle of computation.
The entropy contributions during one closed cycle must add up to
zero and we obtain $h - h_{\mathrm{erase}} - h^{-} = 0$, which
is exactly Eq. 11.

We now modify this stochastic computation by allowing for the
generation of $N + 1$ symbols at a time. The ε-machine inside
the box (Fig. 1) is replaced by the concatenated ε-machine
$M^{\otimes N}$. In (a) this machine starts out in state
$S_{i-1}$; going to (b) it generates $N + 1$ symbols according
to $\Pr(X_i^{i+N} | S_{i-1})$. This causes an increase in
entropy inside the box by $H(X_i^{i+N} | S_{i-1})$. Going to (c)
the concatenated ε-machine moves to state $S_{i+N}$, erasing the
previous-state information, which decreases the entropy inside
the box by $H(S_{i-1} | X_i^{i+N} S_{i+N})$. Ejecting the symbol
sequence into the environment, the entropy of the box decreases
by $H(X_i^{i+N} | S_{i+N})$. Setting w.l.o.g. $i = 0$, we obtain
for the entropy balance of one computational cycle:

$$H(X_0^N | S_{-1}) - H(S_{-1} | X_0^N S_N) - H(X_0^N | S_N) = 0. \tag{14}$$

Up to the term $H(S_N | X_0^N)$, which vanishes as
$N \to \infty$ (Eq. 5), the LHS can be rewritten as

$$H(S_{-1}) - H(S_{-1} | X_0^N S_N) - I(S_{-1}; X_0^N). \tag{15}$$

Letting $N \to \infty$ we obtain

$$C_\mu - H_{\mathrm{erase}} - E = 0. \tag{16}$$

Hence, the entropy balance of one cycle of stochastic
computation is, in the limit of an infinite string of output,
given by Eq. 3 and Theorem 2. We now discuss the thermodynamics
of such a stochastic computation.
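The statement of Theorem 2 and the balance in Eq. 16 can be checked numerically on a small example. The sketch below compares the information erased by a concatenated machine with $C_\mu - E$ for a hypothetical golden mean machine; the machine, its hard-coded stationary distribution, the finite-$N$ estimate of $E$ from block entropies, and all function names are assumptions for illustration. In general the two sides agree only in the limit of large $N$; for this first-order Markov example they already agree at small $N$.

```python
from math import log2

# Hypothetical golden mean machine: (state, symbol) -> (next_state, probability).
MACHINE = {
    ("A", "1"): ("A", 0.5),
    ("A", "0"): ("B", 0.5),
    ("B", "1"): ("A", 1.0),
}
PI = {"A": 2 / 3, "B": 1 / 3}  # stationary distribution (assumed known)


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)


def concatenate(machine, n):
    """n-fold concatenation: (start_state, word) -> (end_state, probability)."""
    states = sorted({state for state, _symbol in machine})
    current = {(state, ""): (state, 1.0) for state in states}
    for _ in range(n):
        extended = {}
        for (start, word), (middle, p) in current.items():
            for (src, symbol), (dst, q) in machine.items():
                if src == middle:
                    extended[(start, word + symbol)] = (dst, p * q)
        current = extended
    return current


def information_erasure(machine, pi):
    """H(start state | emitted word, end state) for one step of `machine`."""
    joint = {(s, w, t): pi[s] * p for (s, w), (t, p) in machine.items()}
    marginal = {}
    for (_s, w, t), p in joint.items():
        marginal[(w, t)] = marginal.get((w, t), 0.0) + p
    return -sum(
        p * log2(p / marginal[(w, t)])
        for (_s, w, t), p in joint.items()
        if p > 0
    )


def block_entropy(machine, pi, n):
    """Entropy of length-n words emitted from the stationary distribution."""
    words = {}
    for (start, word), (_end, p) in concatenate(machine, n).items():
        words[word] = words.get(word, 0.0) + pi[start] * p
    return entropy(words.values())


n = 4
c_mu = entropy(PI.values())
e_n = 2 * block_entropy(MACHINE, PI, n) - block_entropy(MACHINE, PI, 2 * n)
erased_n = information_erasure(concatenate(MACHINE, n), PI)
print(c_mu - e_n, erased_n)
```

Both printed numbers are 2/3 bit for this example, which is the amount of information the toy simulator must erase per cycle.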

The Carnot efficiency of one cycle of an engine consisting of an
ideal gas in a cylinder alternately connected to a hot and a
cold reservoir at temperatures $T_H$ and $T_C$, respectively,
is, as is well known, given by the ratio of work output $W$ and
absorbed heat $Q_H$:

$$\eta = \frac{W}{Q_H} = 1 - \frac{T_C}{T_H}. \tag{17}$$

FIG. 1: One computational cycle for simulating a stochastic
process with an ε-machine.



In 1929, Szilard devised a Gedankenexperiment of a
single-particle engine to resolve the Maxwell's demon paradox,
which seemed to defy the second law of thermodynamics [18]. The
engine consists of a particle in a box, a measurement device
locating the particle in either half of the box, and a memory to
store the measurement result. Szilard considered the following
procedure for extracting work from the particle's thermal
motion. A user measures the particle's position and stores this
one bit of information in the memory. Subsequently she
'compresses' the box to the half which contains the particle.
This does not require any work. The thermal motion of the
particle 'decompresses' the box again and hence lets the user
extract work from the box without any cost. This apparent
paradox is resolved when one factors in the additional energy
required for the user to reset her memory [15]. With the memory
initially at temperature $T_H$ and the reset done at temperature
$T_C$, the efficiency of this engine is given by the Carnot
efficiency (Eq. 17), and the laws of thermodynamics are
restored.
This Gedankenexperiment can be extended to a particle in one of
$2^E$ possible partitions. Hence, storing the measurement result
of the particle's position requires $E$ bits of memory.
Following the same argument as for the original Szilard engine
we obtain the Carnot efficiency $\eta = 1 - T_C / T_H$. This
modified engine leads us directly to a new interpretation of
Theorem 2.

In the simulation of a stochastic process, we attempt to
generate information about its future based on observations of
the past. This may be viewed as a Gedankenexperiment where we
attempt to maximize our knowledge about a particle whose
position is governed by random variables $X_0, X_1, \ldots$,
which we can only measure indirectly by recording appropriate
information from correlated random variables
$X_{-1}, X_{-2}, \ldots$. These recorded bits can then be
translated to information about $X_0, X_1, \ldots$ by the use of
an appropriate simulator. To extract the maximum possible amount
of information, $E$, the theory of ε-machines dictates that we
must record at least $C_\mu$ bits. The minimality and optimality
of ε-machines ensure that any fewer bits would render the
simulation sub-optimal. This results in a stochastic computation
that allows us to extract $Q_H = k T_H E \ln 2$ units of extra
work. Meanwhile, the $C_\mu$ bits stored about the past are
erased, which costs $Q_C = k T_C C_\mu \ln 2$ units of energy.
The efficiency is, just like before, the ratio between output
work and absorbed heat:

$$\eta = 1 - \frac{Q_C}{Q_H} = 1 - \frac{C_\mu T_C}{E \, T_H}. \tag{18}$$



We define the information-theoretic efficiency for computing a
stochastic process:

$$\eta_I := \frac{E}{C_\mu} \le 1. \tag{19}$$

Combining the thermodynamic and information-theoretic
efficiencies we obtain

$$\eta = 1 - \frac{1}{\eta_I} \frac{T_C}{T_H}. \tag{20}$$

For maximal information-theoretic efficiency we recover the
thermodynamic efficiency from Eq. 17.
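A short numerical sketch of Eqs. 18 to 20 may help fix the orders of magnitude. The input values below (a toy process with $E = \log_2 3 - 4/3$ and $C_\mu = \log_2 3 - 2/3$ bits, and reservoir temperatures of 600 K and 100 K) are assumptions chosen for illustration, as are the function and variable names; the Boltzmann constant is the exact SI value.

```python
from math import log, log2

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def engine_efficiencies(e_bits, c_mu_bits, t_hot, t_cold):
    """Return (Q_H, Q_C, eta, eta_I) for the extended Szilard-type engine.

    Q_H = k T_H E ln 2     work extracted using E bits of predictive information
    Q_C = k T_C C_mu ln 2  cost of erasing the C_mu stored bits
    """
    q_hot = K_B * t_hot * e_bits * log(2)
    q_cold = K_B * t_cold * c_mu_bits * log(2)
    eta = 1.0 - q_cold / q_hot       # Eq. 18
    eta_info = e_bits / c_mu_bits    # Eq. 19
    return q_hot, q_cold, eta, eta_info


# Hypothetical inputs: a toy process and two reservoir temperatures in kelvin.
E_BITS = log2(3) - 4 / 3
C_MU_BITS = log2(3) - 2 / 3
T_HOT, T_COLD = 600.0, 100.0

q_hot, q_cold, eta, eta_info = engine_efficiencies(E_BITS, C_MU_BITS, T_HOT, T_COLD)
# With E = C_mu the information-theoretic efficiency is 1 and eta is the Carnot value.
carnot = engine_efficiencies(1.0, 1.0, T_HOT, T_COLD)[2]
print(q_hot, q_cold, eta, eta_info, carnot)
```

Note that for a process with small $E / C_\mu$ the efficiency in Eq. 18 can become negative unless $T_C / T_H$ is correspondingly small, which means that erasing the stored bits costs more than the work extracted.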

$E / C_\mu$ has been named the "predictive efficiency of a
process as the fraction of the information it contains
which actually effects the future" [12]. Our results supply
this concept with physicality and mathematical rigour. 

We have derived the minimum energy cost of simu- 
lating a physical system as the difference between two 
information-theoretic complexity measures of the data. 
Of the two measures, the predictive information measures the
amount of information about a process's past that is transmitted
to the future, whereas the statistical complexity measures the
amount of information required to compute this future. Any
difference between the two is given by
the amount of information erased during the simulation 
of the data and hence represents the minimum energy 
cost of physically running a simulation. 

This result is complementary to the discussion of 
Crutchfield et al. who derive the difference between the 
two measures from the asymmetry of running the process 
in forward and reverse [7]. We add to this a physical in- 
terpretation of the cost of reversing a computation using 
thermodynamics. The lower bound to the energy cost 
of simulating a physical system was derived for optimal 
classical simulators. Recent results showing that quantum
simulators require less information storage could indicate that
quantum information leads to a reduced cost for stochastic
computation [20]. Our results reveal an intricate relation
between thermodynamics, information processing,
and complexity. They motivate the use of information- 
theoretic tools for studying the physics of complex sys- 
tems. 